
ANN_BENCH: CAGRA-HNSW build in managed memory#1058

Draft
achirkin wants to merge 10 commits into branch-25.08 from achirkin-cagra-hnsw-managed


achirkin (Contributor) commented:

This PR replaces the standard configured_raft_resources handle with a customized handle for the CAGRA-HNSW benchmark.
This resource handle uses a single managed memory resource for all three roles: the RMM default memory resource, the RAFT workspace resource, and the RAFT large workspace resource. For the RAFT workspace resource, a pool is used as usual to speed up frequent allocations.
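The layout can be illustrated with a toy C++ model of the resource wiring. This is a sketch under stated assumptions, not the code in this PR: `MemoryResource`, `ManagedResource`, `PoolResource`, `BenchResources` and `make_cagra_hnsw_resources` are hypothetical names. The real handle is assembled from RMM/RAFT types (for example `rmm::mr::managed_memory_resource` plus a pool adapter), and their exact construction is not reproduced here. Managed memory is simulated with the host heap and a live-byte counter.

```cpp
#include <cassert>
#include <cstddef>
#include <map>
#include <new>

// Hypothetical minimal interface; it plays the role of
// rmm::mr::device_memory_resource but is NOT the RMM API.
struct MemoryResource {
  virtual ~MemoryResource() = default;
  virtual void* allocate(std::size_t bytes) = 0;
  virtual void deallocate(void* p, std::size_t bytes) = 0;
};

// Stand-in for a managed (UVM) resource: host heap plus a live-byte counter.
struct ManagedResource : MemoryResource {
  std::size_t live_bytes = 0;
  void* allocate(std::size_t bytes) override {
    void* p = ::operator new(bytes);
    live_bytes += bytes;
    return p;
  }
  void deallocate(void* p, std::size_t bytes) override {
    assert(bytes <= live_bytes);
    live_bytes -= bytes;
    ::operator delete(p);
  }
};

// Simplified pool: caches freed blocks by exact size and reuses them.
// Cached blocks go back upstream only when the pool is destroyed.
class PoolResource : public MemoryResource {
 public:
  explicit PoolResource(MemoryResource* upstream) : upstream_(upstream) {}
  ~PoolResource() override {
    for (auto& [bytes, p] : free_) upstream_->deallocate(p, bytes);
  }
  void* allocate(std::size_t bytes) override {
    auto it = free_.find(bytes);
    if (it == free_.end()) return upstream_->allocate(bytes);  // pool grows
    void* p = it->second;
    free_.erase(it);
    return p;  // reuse a cached block
  }
  void deallocate(void* p, std::size_t bytes) override {
    free_.emplace(bytes, p);  // keep the block for later reuse
  }

 private:
  MemoryResource* upstream_;
  std::multimap<std::size_t, void*> free_;
};

// The three roles a benchmark handle exposes (hypothetical struct).
struct BenchResources {
  MemoryResource* default_mr;          // role of the RMM default resource
  MemoryResource* workspace_mr;        // role of the RAFT workspace resource
  MemoryResource* large_workspace_mr;  // role of the RAFT large workspace resource
};

// One managed upstream for everything; only the frequently used workspace
// sits behind a pool. The caller must construct workspace_pool on &managed.
BenchResources make_cagra_hnsw_resources(ManagedResource& managed,
                                         PoolResource& workspace_pool) {
  return {&managed, &workspace_pool, &managed};
}
```

In this model only workspace allocations are cached by the pool. Memory freed through the default or large-workspace roles goes straight back to the managed upstream, so it does not linger between build stages.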

The rationale behind this change is to allow using all available GPU memory throughout all stages of the CAGRA build.
Before this change, by default, we had a regular device memory pool for everything except the large allocations; only the large_memory_resource used managed memory. The problem with this setup is that the pool grows during the internal IVF-PQ build/search (the whole IVF-PQ index is stored in it) but does not shrink back during the graph optimization stage. As a result, the large allocations made during the optimization stage severely oversubscribe UVM and grind performance to a halt.
With the new change, the RMM default memory resource is no longer served by the pool. Hence the pool stays relatively small (limited by the workspace resource adapter). And because the pool is itself allocated from managed memory, even this small pool can be paged out by UVM when it is not actively in use.
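The failure mode above is essentially byte accounting, so a small model can illustrate it. This is a hypothetical sketch, not a measurement of the benchmark. `GrowOnlyPool` and `optimize_stage_footprint` are invented names. The model assumes the old pool grows to its high-water mark and never releases memory, as described above. The footprint is taken to be the bytes the pool still reserves plus the large optimization-stage allocation. UVM paging of the new, managed-backed pool is ignored.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>

// Toy accounting model (not RMM): a pool that grows on demand up to its
// high-water mark and never returns memory to its upstream.
struct GrowOnlyPool {
  std::size_t reserved = 0;  // bytes the pool has obtained from upstream
  std::size_t in_use = 0;    // bytes currently handed out to callers

  void allocate(std::size_t bytes) {
    in_use += bytes;
    reserved = std::max(reserved, in_use);  // grow, never shrink
  }
  void deallocate(std::size_t bytes) {
    assert(bytes <= in_use);
    in_use -= bytes;  // freed bytes stay reserved by the pool
  }
};

// Bytes that must be backed during graph optimization.
// default_mr_in_pool == true models the old layout (default resource served
// by the pool). false models the new one (default resource is plain managed
// memory, so the IVF-PQ index is released as soon as it is freed).
std::size_t optimize_stage_footprint(bool default_mr_in_pool,
                                     std::size_t ivfpq_index_bytes,
                                     std::size_t workspace_bytes,
                                     std::size_t large_alloc_bytes) {
  GrowOnlyPool pool;
  // IVF-PQ build/search: the index lives in the default resource and
  // temporaries in the workspace.
  if (default_mr_in_pool) pool.allocate(ivfpq_index_bytes);
  pool.allocate(workspace_bytes);
  pool.deallocate(workspace_bytes);
  if (default_mr_in_pool) pool.deallocate(ivfpq_index_bytes);
  // Graph optimization: the large allocation lands on top of whatever the
  // pool still holds.
  return pool.reserved + large_alloc_bytes;
}
```

For example, take illustrative sizes (made up, not benchmark results) of 40 GiB for the IVF-PQ index, 2 GiB of workspace, and 60 GiB for the optimization stage. Under these assumptions the old layout must back 102 GiB during optimization, while the new one needs 62 GiB.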

achirkin self-assigned this on Jun 27, 2025.
achirkin added the improvement (Improves an existing functionality) and non-breaking (Introduces a non-breaking change) labels on Jun 27, 2025.
copy-pr-bot (bot) commented on Jun 27, 2025:

Auto-sync is disabled for draft pull requests in this repository. Workflows must be run manually.


github-actions bot added the cpp label on Jun 27, 2025.
achirkin (Contributor, Author) commented:

/ok to test

achirkin (Contributor, Author) commented on Jul 7, 2025:

/ok to test

achirkin (Contributor, Author) commented:

/ok to test


Labels: cpp, improvement, non-breaking
